Patent abstract:
A computer-implemented method for processing 3D image data of a dentomaxillofacial structure, the method comprising the steps of: receiving 3D image data defining a volume of voxels, wherein each voxel is associated with a radiodensity value and a position in the volume, the voxels providing a 3D representation of a dentomaxillofacial structure; using the voxels of the 3D image data to determine one or more 3D positional features for input to a first deep neural network, a 3D positional feature defining aggregated information from the entire received 3D data set; and the first deep neural network receiving the 3D image data and the one or more 3D positional features at its input, and using the one or more 3D positional features to classify at least part of the voxels of the 3D image data into jaw, teeth and/or nerve voxels.
Publication number: BR112019028132A2
Application number: R112019028132-0
Filing date: 2018-07-02
Publication date: 2020-07-28
Inventors: Frank Theodorus Catharina Claessen; Bas Alexander Verheij; David Anssari Moin
Applicant: Promaton Holding B.V.
IPC main class:
Patent description:

[001] The invention relates to the classification and 3D modeling of 3D dentomaxillofacial structures using deep learning neural networks, and in particular, though not exclusively, to systems and methods for the classification and 3D modeling of 3D dentomaxillofacial structures using deep learning neural networks, a method of training such deep learning neural networks, a method of pre-processing dentomaxillofacial 3D image data, a method of post-processing classified voxel data of dentomaxillofacial structures, and a computer program product for using such methods.

BACKGROUND OF THE INVENTION
[002] In the image analysis of dentomaxillofacial structures, the visualization and 3D image reconstruction of specific parts or tissues are essential to enable accurate diagnosis and treatment. Prior to 3D image reconstruction, a classification and segmentation process is applied to the 3D image data, for example voxels, to form a 3D model of different parts (for example, teeth and jaw) of the dentomaxillofacial structure as represented in a 3D image data stack. The segmentation task can be defined as the identification of the set of pixels or voxels that make up either the contour or the interior of an object of interest. The process of segmenting dentomaxillofacial structures such as teeth, jaw and inferior alveolar nerve in 3D CT scans is, however, challenging. Manual segmentation methods are extremely time-consuming and involve a generic approach of manually selecting thresholds and applying manual corrections. The results of manual segmentation have low reproducibility and depend on the human interpretation of the computed tomography scan.
[003] Different imaging methods have been used to generate 3D models of teeth and jaws based on computed tomography image data. Initially, the sequential application of low-level pixel processing and mathematical modeling was used in order to segment dentomaxillofacial structures. An example is described in the article by Pavaloiu et al., "Automatic segmentation for 3D dental reconstruction", IEEE 6th ICCCNT, July 13-15
[004] These neural networks are trained to learn the features that represent the data in an optimal way. Such deep learning algorithms include a multilayer deep neural network, which transforms input data (e.g., images) into outputs (e.g., disease present/absent) while learning increasingly higher-level features. A successful neural network model for image analysis is the so-called convolutional neural network (CNN). CNNs contain many layers that transform their input using kernels, also known as convolution filters, which consist of a matrix of relatively small size. An overview of the use of CNNs for medical imaging can be found in the article by Litjens et al., "A Survey on Deep Learning in Medical Image Analysis", published February 21, 2017 on arXiv (submitted to Computer Vision and Pattern Recognition). 3D modeling of dentomaxillofacial structures using 3D CNNs is, however, difficult due to the complexity of dentomaxillofacial structures. Pavaloiu et al. described in their article "Neural network based edge detection for CBCT segmentation", 5th IEEE EHB, November 19-21, 2015, the use of a very simple neural network for the detection of edges in 2D CBCT images. So far, however, accurate automatic 3D segmentation of 3D CBCT image data based on deep learning has not been reported.
[005] A problem in the classification and 3D modeling of dentomaxillofacial structures is that dentomaxillofacial images are generated using cone beam computed tomography (CBCT). CBCT is a medical imaging technique using X-ray computed tomography in which the X-ray radiation is shaped into a divergent, low-dosage cone. The radiodensity, measured in Hounsfield Units (HUs), is not reliable in CBCT scans, because different areas in the scan appear with different grayscale values depending on their relative positions in the organ being scanned. HUs measured in the same anatomical area with both CBCT and medical-grade CT scanners are not identical and are therefore unreliable for the determination of site-specific, radiographically identified bone density.
[006] In addition, CBCT systems for scanning dentomaxillofacial structures do not employ a standardized system for scaling the gray levels that represent the reconstructed density values. These values are as such arbitrary and do not allow the assessment of bone quality. In the absence of such standardization, it is difficult to interpret the gray levels, or impossible to compare the values resulting from different machines. Moreover, the roots of the teeth and the bone structure of the jaw have similar densities, so it is difficult for a computer to distinguish between voxels belonging to teeth and voxels belonging to a jaw. Additionally, CBCT systems are very sensitive to so-called beam hardening, which produces dark streaks between two high-attenuation objects (such as metal or bone), with bright streaks around them. The problems mentioned above make the automatic segmentation of dentomaxillofacial structures particularly challenging.
[007] Therefore, there is a need in the art for computer systems adapted to accurately segment 3D image data of a computed tomography scan of dentomaxillofacial structures into a 3D model. In particular, there is a need in the art for computer systems that can accurately segment 3D computed tomography image data of dentomaxillofacial structures originating from different CBCT systems into a 3D model.

SUMMARY OF THE INVENTION
[008] As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, microcode, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "circuit," "module" or "system." Functions described in this disclosure may be implemented as an algorithm executed by a microprocessor of a computer. Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer-readable media having computer-readable program code embodied, e.g., stored, thereon.
[009] Any combination of one or more computer-readable media may be used. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
[010] A computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer-readable signal medium may be any computer-readable medium that is not a computer-readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
[011] Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing. Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including a functional or an object-oriented programming language such as Java(TM), Scala, C++, Python or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
[012] Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor, in particular a microprocessor, central processing unit (CPU) or graphics processing unit (GPU), of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer, other programmable data processing apparatus, or other devices, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
[013] These computer program instructions may also be stored in a computer-readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
[014] The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer-implemented process, such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
[015] The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or combinations of special-purpose hardware and computer instructions.
[016] The present disclosure provides a system and method implementing automated classification and segmentation techniques that require no user input or interaction other than the entry of a 3D image stack. The embodiments can be used to reproduce targeted biological tissues, such as jaw bones, teeth and dentomaxillofacial nerves, such as the inferior alveolar nerve. The system automatically separates the structures and builds 3D models of the targeted tissues.
[017] In one aspect, the invention relates to a computer-implemented method for processing 3D image data of a dentomaxillofacial structure. In one embodiment, the method may comprise: a computer receiving 3D input data, preferably 3D cone beam CT (CBCT) data, the 3D input data including a first voxel representation of the dentomaxillofacial structure, a voxel being associated with a radiation intensity value, the voxels of the voxel representation defining an image volume.
[018] Thus, the 3D positional features define information about the positions of voxels in the received image volume in relation to a dental reference plane and/or a dental reference object. This information is relevant for enabling the deep neural network to automatically classify and segment a voxel representation of a dentomaxillofacial structure. A 3D positional feature of a voxel of the first voxel representation may be formed by aggregating information (for example, positions, intensity values, distances, gradients, etc.) that is based on the entire data set, or a substantial part thereof, of the voxel representation that is provided to the input of the first deep neural network. The aggregated information is processed per position of a voxel in the first voxel representation. In this way, each voxel of the first voxel representation can be associated with a 3D positional feature, which the first deep neural network will take into account when classifying the voxel.
[019] In one embodiment, the training set may additionally comprise one or more 3D models of parts of the dentomaxillofacial structures of the 3D image data of the training set. In one embodiment, at least part of the one or more 3D models may be generated by optically scanning parts of the dentomaxillofacial structures of the 3D image data of the training set. In one embodiment, the one or more 3D models may be used as target objects when training the first deep neural network.
[020] The 3D positional features may be determined using (manually) engineered features and/or using (trained) machine learning methods, such as a 3D deep learning network configured to derive this information from the entire received 3D data set or a substantial part thereof.
[021] In one embodiment, a 3D positional feature may define a distance, preferably a perpendicular distance, between one or more voxels in the image volume and a first dental reference plane in the image volume. In one embodiment, a 3D positional feature may define a distance between one or more voxels in the image volume and a first dental reference object in the image volume. In a further embodiment, the positional information may include intensity values accumulated in reference planes of the image volume, where an intensity value accumulated at a point in a reference plane includes accumulated intensity values of voxels on or in the proximity of the normal running through that point of the reference plane.
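By way of illustration only, the following Python sketch shows how two such hand-engineered positional features could be computed for a voxel volume; the function name, axis ordering (z, y, x) and signed-distance convention are assumptions of this example, not part of the claimed method:

```python
import numpy as np

def plane_features(volume, axial_ref_index):
    """Two illustrative 3D positional features per voxel.

    volume: 3D numpy array of radiodensity values, axes (z, y, x).
    axial_ref_index: z-index of the axial dental reference plane.
    """
    z = np.arange(volume.shape[0])
    # Feature 1: signed perpendicular distance of each voxel to the
    # axial reference plane (constant within each axial slice).
    dist_to_plane = np.broadcast_to(
        (z - axial_ref_index)[:, None, None], volume.shape)
    # Feature 2: intensity accumulated along the normal of the axial
    # plane (summed over z), one value per (y, x) point, broadcast back
    # to every voxel lying on that normal.
    accumulated = np.broadcast_to(volume.sum(axis=0), volume.shape)
    return dist_to_plane, accumulated
```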
[022] The 3D positional features that are extracted from the 3D image data encode information with respect to the image volume of the voxels that are provided to the input of the neural network. In particular, the 3D positional features provide information that is partly or fully derived in relation to the position of each voxel within the (subsection of the) 3D image data to be evaluated by the deep neural network. The 3D positional features provide the neural network with the means to make use of information (partly) determined by the positions of voxels within the image volume when determining the probability that, in a given volume, voxels can be found that are associated with certain dentomaxillofacial structures. Without this information, no such larger spatial context would be available for use by the deep neural network. The 3D positional features substantially improve the accuracy of the network while, at the same time, being designed to reduce the risk of overfitting. The 3D positional features allow the network to acquire knowledge about voxel positions in the image volume in relation to objects relevant to the dentomaxillofacial context, making this information available for determining the probability of finding voxels associated with tissue of a dentomaxillofacial structure. In this way, the network is enabled to learn how best to make use of the provided information where relevant.
[023] In one embodiment, the first dental reference plane may include an axial plane in the image volume positioned at a predetermined distance from the upper and/or lower jaw as represented by the 3D image data. The reference plane is thus positioned in relation to the relevant parts of the dentomaxillofacial structures in the 3D image data. In one embodiment, the first dental reference plane may be positioned at approximately equal distance from the upper and lower jaw.
[024] In one embodiment, the dental reference object may include a dental arch curve approximating at least part of a dental arch as represented by the 3D image data. In this embodiment, a 3D positional feature thus provides information regarding the positions of voxels in the image volume in relation to the position of a dental arch object in the image volume. In one embodiment, the dental arch curve may be determined in an axial plane of the image volume.
[025] Manually engineered 3D positional features may be complemented or replaced by other 3D positional features, such as those derived from machine learning methods that aggregate information from all or a substantial part of the 3D input data. Such feature generation can, for example, be carried out by a 3D deep neural network that performs a pre-segmentation on a downsampled version of all or a substantial part of the first voxel representation.
[026] Therefore, in one embodiment, the pre-processing algorithm may include a second 3D deep neural network, the second deep neural network being trained to receive a second voxel representation at its input and to determine, for each voxel of the second voxel representation, a 3D positional feature. In one embodiment, the 3D positional feature may include a measure indicating the probability that a voxel represents jaw, teeth and/or nerve tissue, where the second voxel representation is a low-resolution version of the first voxel representation.
[027] In one embodiment, the second 3D deep neural network may have a 3D U-net architecture. In one embodiment, the 3D U-net may comprise a plurality of 3D neural network layers, including 3D convolutional layers (3D CNNs), 3D max-pooling layers, 3D deconvolutional layers (3D de-CNNs), and densely connected layers.
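A minimal sketch of such a 3D U-net-style network is given below (here in PyTorch, with a two-level depth and channel counts chosen arbitrarily for the example; the disclosure does not prescribe these hyperparameters, and input dimensions are assumed to be even so that pooling and upsampling align):

```python
import torch
import torch.nn as nn

class UNet3D(nn.Module):
    """Two-level 3D U-net sketch: 3D convolutions, max pooling on the
    way down, a transposed 3D convolution (3D de-CNN) on the way up,
    and a skip connection between the matching resolutions."""
    def __init__(self, in_ch=1, out_ch=1):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv3d(in_ch, 16, 3, padding=1), nn.ReLU(),
                                  nn.Conv3d(16, 16, 3, padding=1), nn.ReLU())
        self.pool = nn.MaxPool3d(2)
        self.enc2 = nn.Sequential(nn.Conv3d(16, 32, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose3d(32, 16, 2, stride=2)    # 3D de-CNN
        self.dec1 = nn.Sequential(nn.Conv3d(32, 16, 3, padding=1), nn.ReLU())
        self.head = nn.Conv3d(16, out_ch, 1)                 # per-voxel output

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        d1 = self.dec1(torch.cat([self.up(e2), e1], dim=1))  # skip connection
        return torch.sigmoid(self.head(d1))   # per-voxel probability measure
```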
[028] In one embodiment, the resolution of the second voxel representation may be at least three times lower than the resolution of the first voxel representation.
[029] In one embodiment, the second 3D deep neural network may be trained based on the 3D image data of dentomaxillofacial structures of the training set that is used to train the first deep neural network. In one embodiment, the second 3D deep neural network may be trained based on one or more 3D models of parts of the dentomaxillofacial structures of the 3D image data of the training set that is used to train the first deep neural network. During training, these one or more 3D models may be used as targets.
[030] In one embodiment, providing the first voxel representation and the one or more 3D positional features associated with the first voxel representation to the input of a first 3D deep neural network may further comprise: associating each voxel of the first voxel representation with at least information defined by a 3D positional feature; dividing the first voxel representation into first blocks of voxels; and providing a first block of voxels to the input of the first deep neural network, in which each voxel of the first block of voxels is associated with a radiation intensity value and at least information defined by a 3D positional feature. The first 3D deep neural network can thus process the 3D input data on the basis of voxel blocks. To that end, the computer may partition the first voxel representation into a plurality of first voxel blocks and provide each of the first blocks to the input of the first 3D deep neural network.
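Purely as an illustration, such block-wise processing could be organized as in the following sketch (the 32³ block size is an assumption of the example, not a value specified here):

```python
import numpy as np

def iter_blocks(volume, block=(32, 32, 32)):
    """Yield non-overlapping first voxel blocks together with their
    offsets; blocks at the volume borders are simply truncated here."""
    for z in range(0, volume.shape[0], block[0]):
        for y in range(0, volume.shape[1], block[1]):
            for x in range(0, volume.shape[2], block[2]):
                yield (z, y, x), volume[z:z + block[0],
                                        y:y + block[1],
                                        x:x + block[2]]
```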
[031] In one embodiment, the first deep neural network may comprise a plurality of first 3D convolutional layers, in which the output of the plurality of first 3D convolutional layers may be connected to at least one fully connected layer. In one embodiment, the plurality of first 3D convolutional layers may be configured to process a first voxel block of the first voxel representation, and the at least one fully connected layer may be configured to classify voxels of the first voxel block into at least one of jaw, teeth and/or nerve voxels.
[032] In one embodiment, a voxel provided to the input of the first deep neural network may comprise a radiation intensity value and at least one 3D positional feature.
[033] In one embodiment, the first deep neural network may additionally comprise a plurality of second 3D convolutional layers, wherein the output of the plurality of second 3D convolutional layers may be connected to the at least one fully connected layer.
[034] In one embodiment, the plurality of second 3D convolutional layers may be configured to process a second voxel block of the first voxel representation, in which the first and second voxel blocks have the same or substantially the same center point in the image volume, and in which the second block of voxels represents a volume that is larger in real-world dimensions than the real-world volume of the first block of voxels.
[035] In one embodiment, the plurality of second 3D convolutional layers may be configured to determine contextual information associated with voxels of the first blocks of voxels that are provided to the input of the plurality of first 3D convolutional layers.
[036] In one embodiment, the first deep neural network may additionally comprise a plurality of third 3D convolutional layers, the output of the plurality of third 3D convolutional layers being connected to the at least one fully connected layer. The plurality of third 3D convolutional layers may be configured to process one or more 3D positional features associated with voxels of the at least one first block of voxels that is provided to the input of the plurality of first 3D convolutional layers.
[037] In one embodiment, the first deep neural network may be trained on the basis of a training set, the training set including 3D image data of dentomaxillofacial structures, one or more 3D positional features derived from the 3D image data, and one or more 3D models of parts of the dentomaxillofacial structures of the 3D image data of the training set, where the one or more 3D models may be used as targets during the training of the first deep neural network. In one embodiment, at least part of the one or more 3D models may be generated by optically scanning parts of the dentomaxillofacial structures of the 3D image data of the training set. Thus, instead of manually segmented 3D image data, optically scanned 3D models are used to train the neural network, thereby providing accurate, high-resolution models that can be used as target data.
[038] In one embodiment, the determination of the one or more 3D positional features may include: determining a point cloud of intensity values accumulated in a plane of the image volume, preferably an axial plane, in which an intensity value accumulated at a point in the plane is determined by summing the values of voxels positioned on or in the proximity of the normal running through that point of the axial plane; determining the accumulated intensity values in the plane that are above a predetermined value; and fitting a curve through the determined accumulated intensity values, the curve approximating at least part of a dental arch in the dentomaxillofacial structure represented by the 3D image data. Dental structures such as a dental arch can thus be determined by summing the intensity values of voxels positioned in the direction of the normal of a plane, for example, an axial plane.
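A minimal sketch of such an arch-curve determination is given below (the polynomial degree and the use of numpy's least-squares polynomial fit are illustrative assumptions; any suitable curve-fitting technique could be substituted):

```python
import numpy as np

def dental_arch_curve(volume, threshold):
    """Approximate the dental arch in the axial plane.

    Sums radiodensity values along the normal of the axial plane (the
    z-axis), keeps the (y, x) points whose accumulated intensity
    exceeds the threshold, and fits a polynomial through them.
    """
    accumulated = volume.sum(axis=0)              # one value per (y, x) point
    ys, xs = np.nonzero(accumulated > threshold)  # high-intensity point cloud
    coeffs = np.polyfit(xs, ys, deg=4)            # degree 4 is an assumption
    return np.poly1d(coeffs)                      # y = f(x) arch approximation
```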
[039] In one embodiment, the one or more 3D positional features may include a first 3D positional feature defining a relative distance in a plane in the image volume, preferably an axial plane in the image volume, between voxels in the plane and an origin on a dental arch curve defined in the plane. In one embodiment, the origin may be defined as the point on the dental arch curve where the derivative of the curve is zero.
[040] In one embodiment, the one or more 3D positional features include a second 3D positional feature defining a relative distance in a plane in the image volume, preferably an axial plane in the image volume, the distance being the smallest distance in the axial plane between voxels in the axial plane and the dental arch curve.
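Continuing the previous sketch, both arch-relative features could be computed by brute force as follows (a sketch only; the sampling density is an assumption, and the pairwise distance matrix is acceptable for modest slice sizes but not optimized):

```python
import numpy as np

def arch_relative_features(shape_yx, arch, samples=200):
    """Two in-plane positional features relative to a fitted arch curve.

    shape_yx: (height, width) of the axial slice.
    arch: np.poly1d dental arch approximation, y = arch(x).
    """
    xs = np.linspace(0, shape_yx[1] - 1, samples)
    pts = np.stack([arch(xs), xs], axis=1)       # (y, x) samples on the curve
    # Origin: the point on the curve where its derivative is zero.
    origin = np.argmin(np.abs(arch.deriv()(xs)))
    # Arc length along the curve, measured from the origin.
    seg = np.r_[0.0, np.cumsum(np.linalg.norm(np.diff(pts, axis=0), axis=1))]
    seg -= seg[origin]

    yy, xx = np.mgrid[0:shape_yx[0], 0:shape_yx[1]]
    grid = np.stack([yy.ravel(), xx.ravel()], axis=1)   # all voxels in the plane
    d = np.linalg.norm(grid[:, None, :] - pts[None, :, :], axis=2)
    nearest = d.argmin(axis=1)
    dist_from_arch = d[np.arange(len(grid)), nearest].reshape(shape_yx)
    dist_along_arch = seg[nearest].reshape(shape_yx)
    return dist_along_arch, dist_from_arch
```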
[041] In one embodiment, the 3D positional features may be determined on the basis of automatic feature generation using all or a substantial part of the 3D input data. In one embodiment, the automatic feature generation performs a pre-segmentation on a downsampled version of all or a substantial part of the 3D input data.
[042] In one embodiment, the first deep neural network may comprise a first data processing path including at least a first set of 3D convolutional layers, preferably a first set of 3D CNN feature layers, configured to determine progressively higher abstractions of information useful for deriving the voxel classification; and a second data processing path parallel to the first path, the second path comprising a second set of 3D convolutional layers, preferably a second set of 3D CNN feature layers, where the second set of 3D convolutional layers may be configured to determine progressively higher abstractions of information useful for deriving the voxel classification using larger spatial contextual representations of the blocks of voxels that are fed to the input of the first set of 3D convolutional layers.
[043] The second set of 3D CNN feature layers thus processes voxels in order to generate 3D feature maps that include information about the direct proximity of the associated voxels processed by the first set of 3D CNN feature layers. In this way, the second path allows the neural network to determine contextual information, that is, information about the context (for example, the surroundings) of the voxels of the 3D image data that are presented to the input of the neural network. Using two or even more paths, both the 3D image data (the input data) and contextual information about voxels of the 3D image data can be processed in parallel. Contextual information is important for classifying dentomaxillofacial structures, which typically include closely packed dental structures that are difficult to distinguish.
[044] In one embodiment, the first deep neural network may additionally comprise a third data processing path including a third set of 3D convolutional layers, preferably a third set of 3D CNN feature layers, parallel to the first and second paths, for receiving the one or more 3D positional features associated with the 3D image data, the third set of 3D convolutional layers being configured to encode relevant information from the aggregated information of the entire received 3D data set that is associated with the voxel blocks fed to the input of the first set of 3D convolutional layers.
[045] In one embodiment, instead of using a third data processing path, the 3D positional features may be added to the first voxel representation such that they are paired per voxel of the first voxel representation, for example, by adding the 3D positional feature information as additional channels to the received 3D image information.
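In code, such per-voxel pairing could look like the following sketch (channel-first layout is an assumption of the example):

```python
import numpy as np

def pair_voxels_with_features(volume, feature_volumes):
    """Pair each voxel with its 3D positional features by stacking the
    feature volumes as additional channels next to the radiodensity
    channel; every feature volume must share the (Z, Y, X) shape."""
    return np.stack([volume, *feature_volumes], axis=0)   # (1 + N, Z, Y, X)
```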
[046] In one embodiment, the outputs of the first, second and (optionally) third sets of 3D convolutional layers may be provided to the input of a set of fully connected layers that are configured to classify at least part of the voxels of the image volume into at least jaw, teeth and/or nerve voxels.
[047] In one embodiment, the method may additionally comprise: a third deep neural network post-processing the voxels classified by the first deep neural network, the post-processing including the correction of voxels that are incorrectly classified by the first deep neural network. In one embodiment, this third neural network may be trained using voxels that are classified during the training of the first deep neural network as input, and using one or more 3D models of parts of the dentomaxillofacial structures of the 3D image data of the training set as a target. In this embodiment, a further convolutional neural network is thus trained to correct voxels classified by the first neural network. In this way, very accurate 3D models of individual parts of the dentomaxillofacial structure can be determined, including 3D models of teeth and jaw.
[048] In one aspect, the invention may relate to a computer-implemented method for training a deep learning neural network system to process 3D image data of a dentomaxillofacial structure. In one embodiment, the method may include: a computer receiving training data, the training data including: 3D input data, preferably 3D cone beam CT (CBCT) image data, the 3D input data defining one or more voxel representations of one or more dentomaxillofacial structures, respectively, a voxel being associated with a radiation intensity value, the voxels of a voxel representation defining an image volume; the computer using a pre-processing algorithm to pre-process the one or more voxel representations of the one or more dentomaxillofacial structures, respectively, in order to determine one or more 3D positional features for voxels in the one or more voxel representations, a 3D positional feature defining information about a position of at least one voxel of a voxel representation of the dentomaxillofacial structures in relation to the position of a dental reference plane (for example, an axial plane positioned in relation to the jaw) or the position of a dental reference object (for example, a jaw, a dental arch and/or one or more teeth) in the image volume; and using the training data and the one or more 3D positional features to train the first deep neural network to classify voxels into jaw, teeth and/or nerve voxels.
[049] In one embodiment, the training data may additionally include: one or more 3D models of parts of the dentomaxillofacial structures represented by the 3D input data of the training data. In one embodiment, at least part of the one or more 3D models may be generated by optically scanning parts of the dentomaxillofacial structures of the 3D image data of the training data. In one embodiment, the one or more 3D models may be used as a target during the training of the first deep neural network.
[050] In one embodiment, the method may include: using voxels that are classified during the training of the first deep neural network and the one or more 3D models of parts of the dentomaxillofacial structures of the 3D image data of the training set to train a second neural network to post-process the voxels classified by the first deep neural network, the post-processing including the correction of voxels that are incorrectly classified by the first deep neural network.
[051] In a further aspect, the invention may relate to a computer system adapted to process 3D image data of a dentomaxillofacial structure, comprising: a computer-readable storage medium having computer-readable program code embodied therewith, the computer-readable program code including a pre-processing algorithm and a first deep neural network; and a processor, preferably a microprocessor, coupled to the computer-readable storage medium, wherein, responsive to executing the computer-readable program code, the processor is configured to perform executable operations including: receiving 3D input data, preferably 3D cone beam CT (CBCT) data, the 3D input data including a first voxel representation of the dentomaxillofacial structure, a voxel being associated with a radiation intensity value, the voxels of the voxel representation defining an image volume; a pre-processing algorithm using the 3D input data to determine one or more 3D positional features of the dentomaxillofacial structure, a 3D positional feature defining information about voxel positions of the first voxel representation in relation to the position of a dental reference plane, for example, an axial plane positioned in relation to the jaw, or the position of a dental reference object, for example, a jaw, a dental arch and/or one or more teeth, in the image volume; providing the first voxel representation and the one or more 3D positional features associated with the first voxel representation to the input of a first 3D deep neural network, preferably a 3D deep convolutional neural network, the first deep neural network being configured to classify voxels of the first voxel representation into at least jaw, teeth and/or nerve voxels, the first neural network being trained on the basis of a training set, the training set including 3D image data of dentomaxillofacial structures and one or more 3D positional features derived from the 3D image data of the training set; and receiving classified voxels of the first voxel representation from the output of the first 3D deep neural network and determining a voxel representation of at least one of the jaw, teeth and/or nerve tissues of the dentomaxillofacial structure based on the classified voxels.
[052] In one embodiment, the training set may additionally comprise one or more 3D models of parts of the dentomaxillofacial structures of the 3D image data of the training set. In one embodiment, at least part of the one or more 3D models may be generated by optically scanning parts of the dentomaxillofacial structures of the 3D image data of the training set. In one embodiment, the one or more 3D models may be used as a target during training of the first deep neural network.
[053] In one embodiment, the pre-processing algorithm may include a second 3D deep neural network, the second deep neural network being trained to receive a second voxel representation at its input and to determine, for each voxel of the second voxel representation, a 3D positional feature, preferably the 3D positional feature including a measure indicating a probability that a voxel represents jaw, teeth and/or nerve tissue, where the second voxel representation is a low-resolution version of the first voxel representation, preferably the resolution of the second voxel representation being at least three times lower than the resolution of the first voxel representation, and preferably the second 3D deep neural network being trained based on the 3D image data of dentomaxillofacial structures and the one or more 3D models of parts of the dentomaxillofacial structures of the 3D image data of the training set used to train the first deep neural network.
[054] In one embodiment, the first deep neural network may comprise: a plurality of first 3D convolutional layers, the output of the plurality of first 3D convolutional layers being connected to at least one fully connected layer, in which the plurality of first 3D convolutional layers is configured to process a first voxel block of the first voxel representation and in which the at least one fully connected layer is configured to classify voxels of the first voxel block into at least one of jaw, teeth and/or nerve voxels, preferably each voxel provided to the input of the first deep neural network comprising a radiation intensity value and at least one 3D positional feature.
[055] In one embodiment, the first deep neural network may additionally comprise: a plurality of second 3D convolutional layers, the output of the plurality of second 3D convolutional layers being connected to the at least one fully connected layer, in which the plurality of second 3D convolutional layers is configured to process a second voxel block of the first voxel representation, the first and second voxel blocks having the same or substantially the same center point in the image volume and the second voxel block representing a volume that is larger in real-world dimensions than the real-world volume of the first block of voxels, the plurality of second 3D convolutional layers being configured to determine contextual information associated with the voxels of the first block of voxels that are provided to the input of the plurality of first 3D convolutional layers.
[056] The invention may also relate to a computer program product comprising software code portions configured to, when run in the memory of a computer, execute any of the methods described above.
[057] The invention will be further illustrated with reference to the attached drawings, which schematically show embodiments according to the invention. It should be understood that the invention is in no way restricted to these specific embodiments.
[058] In this disclosure, embodiments are described of computer systems and computer-implemented methods that use deep neural networks to classify, segment and 3D model dentomaxillofacial structures on the basis of 3D image data, for example, 3D image data defined by a sequence of images forming a CT image data stack, in particular a cone beam CT (CBCT) data stack. The 3D image data may comprise voxels forming a 3D image space of a dentomaxillofacial structure. A computer system according to the invention may comprise at least one deep neural network that is trained to classify a 3D image data stack of a dentomaxillofacial structure into voxels of different classes, where each class may be associated with a distinct part (for example, teeth, jaw, nerve) of the structure. The computer system may be configured to perform a training process that iteratively trains (optimizes) one or more deep neural networks on the basis of one or more training sets that may include accurate 3D models of dentomaxillofacial structures. These 3D models may include optically scanned dentomaxillofacial structures (teeth and/or jaw).
[059] Once trained, the deep neural network can receive a 3D image data stack of a dentomaxillofacial structure and classify the voxels of the 3D image data stack. Before being presented to the trained deep neural network, the data may be pre-processed so that the neural network can classify the voxels efficiently and accurately. The output of the neural network may include different collections of voxel data, where each collection may represent a distinct part, for example, teeth or jaw, of the 3D image data. The classified voxels can be post-processed in order to reconstruct an accurate 3D model of the dentomaxillofacial structure.
[060] The computer system comprising a neural network trained to automatically classify voxels of dentomaxillofacial structures, the training of the network, the pre-processing of the 3D image data before it is fed to the neural network, as well as the post-processing of voxels classified by the neural network, are described hereunder in more detail.
[061] Figure 1 schematically represents a computer system for classification and segmentation of 3D dentomaxillofacial structures according to an embodiment of the invention. In particular, the computer system 102 may be configured to receive a 3D image data stack 104 of a dentomaxillofacial structure. The structure may include jaw, teeth and nerve structures. The 3D image data may comprise voxels, that is, 3D space elements associated with a voxel value, for example, a grayscale value or a color value, representing a radiation intensity or density value. Preferably, the 3D image data stack may include CBCT image data according to a predetermined format, for example, the DICOM format or a derivative thereof.
[062] The computer system may comprise a pre-processor 106 for pre-processing the 3D image data before it is fed to the input of a first 3D deep learning neural network 112, which is trained to produce a 3D set of classified voxels as output 114. As will be described hereunder in more detail, the 3D deep learning neural network may be trained according to a predetermined training scheme so that the trained neural network is capable of accurately classifying voxels of the 3D image data stack into voxels of different classes (for example, voxels associated with teeth, jaw and/or nerve tissue). The 3D deep learning neural network may comprise a plurality of connected 3D convolutional neural network (3D CNN) layers.
[063] The computer system may additionally comprise a post-processor 116 for accurately reconstructing 3D models of different parts of the dentomaxillofacial structure (for example, tooth, jaw and nerve) using the voxels classified by the 3D deep learning neural network. As will be described hereunder in greater detail, part of the classified voxels, for example, voxels classified as belonging to a tooth structure or a jaw structure, are input to a further, second 3D deep learning neural network 120, which is trained to reconstruct 3D volumes for the dentomaxillofacial structures, for example, the shape of the jaw 124 and the shape of the teeth 126, based on the voxels that were classified as belonging to these structures. Other parts of the classified voxels, for example, voxels classified by the 3D deep neural network as belonging to nerves, may be post-processed using an interpolation function 118 and stored as 3D nerve data 122. The task of determining, from the classified voxels, the volume that represents a nerve is currently of a nature that is beyond the capability of (the processing power available to) a deep neural network. Furthermore, the presented classified voxels might not contain the information that would be suitable for a neural network to solve this particular problem. Therefore, in order to accurately and efficiently post-process the classified nerve voxels, an interpolation of the classified voxels is used. After the post-processing of the 3D data of the various parts of the dentomaxillofacial structure, the nerve, jaw and tooth data 122 to 126 can be combined and formatted into separate 3D models 128 that accurately represent the dentomaxillofacial structures in the 3D image data that were fed to the input of the computer system.
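As an illustration of such an interpolation function, the following sketch fits a smoothing spline through the coordinates of voxels classified as nerve (a sketch only; the assumption that the nerve voxels run roughly along the z-axis, and the smoothing and sampling parameters, are choices of this example):

```python
import numpy as np
from scipy.interpolate import splev, splprep

def interpolate_nerve(nerve_voxels, samples=200):
    """Fit a smooth 3D curve through voxels classified as nerve.

    nerve_voxels: (N, 3) array of voxel coordinates (z, y, x), assumed
    to lie roughly along the course of the nerve canal.
    """
    pts = nerve_voxels[np.argsort(nerve_voxels[:, 0])]    # order along z
    tck, _ = splprep([pts[:, 0], pts[:, 1], pts[:, 2]], s=len(pts))
    u = np.linspace(0.0, 1.0, samples)
    return np.stack(splev(u, tck), axis=1)                # dense centerline
```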
[064] In CBCT, the radiodensity (measured in Hounsfield Units (HU)) is inaccurate because different areas in the scan appear with different grayscale values depending on their relative positions in the organ being scanned. HU measured in the same anatomical area with both CBCT and medical-grade CT scanners are not identical and are therefore unreliable for the determination of site-specific, radiographically identified bone density.
[065] Moreover, dental CBCT systems do not employ a standardized system for scaling the gray levels that represent the reconstructed density values. These values are as such arbitrary and do not allow the assessment of bone quality. In the absence of such standardization, it is difficult to interpret the gray levels, or impossible to compare the values resulting from different machines.
[066] The teeth and jaw bone structure have similar densities, so that it is difficult for a computer to distinguish between voxels belonging to teeth and voxels belonging to a jaw. Additionally, CBCT systems are very sensitive to so-called beam hardening, which produces dark streaks between two high-attenuation objects (such as metal or bone), with bright streaks around them.
[067] In order to make the 3D deep learning neural network robust against the problems mentioned above, the 3D neural network can be trained, using a module 138, to make use of 3D models of parts of the dentomaxillofacial structure as represented by the 3D image data. The 3D training data 130 may be correctly aligned with a CBCT image presented at 104, for which the associated desired output is known (for example, 3D CT image data of a dentomaxillofacial structure and an associated 3D segmented representation of the dentomaxillofacial structure). Conventional 3D training data may be obtained by manually segmenting the input data, which may represent a significant amount of work. In addition, manual segmentation results in low reproducibility and consistency of the input data to be used.
[068] In order to counter this problem, in one embodiment, optically produced training data 130, that is, accurate 3D models of (parts of) the dentomaxillofacial structure, may be used instead of, or at least in addition to, manually segmented training data. Dentomaxillofacial structures that are used to produce the training data may be scanned using a 3D optical scanner. Such 3D optical scanners are known in the art and can be used to produce high-quality 3D data of tooth and jaw surfaces. The 3D surface data may include 3D surface meshes 132 that can be filled (determining which specific voxels are part of the volume enclosed by the mesh) and used by a voxel classifier 134. In this way, the voxel classifier is capable of generating high-quality classified voxels for training 136. Additionally, as mentioned above, manually classified training voxels may be used by the training module to train the network as well. The training module may use the classified training voxels as a target and the associated CT training data as input.
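A sketch of how an optically scanned surface mesh could be "filled" into classified training voxels, using the open-source trimesh library as one possible tool (the file name and voxel pitch are illustrative assumptions):

```python
import trimesh

# Load an optically scanned surface mesh (file name illustrative).
mesh = trimesh.load("tooth_scan.stl")

# Rasterize the surface at (roughly) the CT voxel pitch and fill the
# interior, marking every voxel inside the volume enclosed by the mesh.
voxel_grid = mesh.voxelized(pitch=0.2).fill()
occupancy = voxel_grid.matrix   # boolean 3D array: True = voxel inside mesh
```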
[069] Additionally, during the training process, the CT training data may be pre-processed by a feature extractor 108, which may be configured to determine 3D positional features. A dentomaxillofacial feature may encode at least spatial information associated with one or more parts of the dentomaxillofacial structure in the image (the received 3D data set). For example, in one embodiment, a manually engineered 3D positional feature may include a 3D curve representing (part of) the jaw, in particular the dental arch, in the 3D volume containing the voxels. One or more weight parameters may be assigned to points along the 3D curve. The value of a weight may be used to encode a translation in the 3D space from voxel to voxel. Rather than incorporating, for example, an encoded version of the original space of the received image stack, the encoded space is specific to the dentomaxillofacial structures as detected in the input. The feature extractor may determine one or more curves approximating one or more curves of the jaw and/or teeth (for example, the dental arch) by examining the voxel values that represent radiation intensity or density values and fitting one or more curves (for example, a polynomial) through certain voxels. Derivatives of (parts of) dental arch curves of a 3D CT image data stack may be stored as a positional feature map 110.
[070] In another embodiment, such 3D positional features may, for example, be determined by means of a (trained) machine learning method, such as a 3D deep neural network, designed to derive the relevant information from the entire received 3D data set.
[071] Figure 2 represents a flow diagram of training a deep neural network to classify dentomaxillofacial 3D image data according to an embodiment of the invention. Training data is used to train a 3D deep learning neural network so that it is able to automatically classify voxels of a 3D CT scan of a dentomaxillofacial structure. As shown in this figure, a representation of a dentomaxillofacial complex 202 may be provided to the computer system. The training data may include a CT image data stack 204 of a dentomaxillofacial structure and an associated 3D model, for example, 3D data 206 from an optical scan of the same dentomaxillofacial structure. Examples of such 3D CT image data and 3D optical scanning data are shown in Figures 3A and 3B. Figure 3A represents DICOM slices associated with different planes of a 3D CT scan of a dentomaxillofacial structure, for example, an axial plane 302, a frontal or coronal plane 304, and a sagittal plane 306. Figure 3B represents 3D optical scanning data of a dentomaxillofacial structure. The computer may form 3D surface meshes 208 of the dentomaxillofacial structure based on the optical scanning data. In addition, an alignment function 210 may be employed, configured to align the surface meshes to the 3D CT image data. After alignment, the representations of the 3D structures provided to the input of the computer use the same spatial coordinate system. Based on the aligned CT image data and 3D surface meshes, positional features 212 and classified voxel data of the optically scanned 3D model 214 may be determined. The positional features and the classified voxel data may then be provided to the input of the deep neural network 216, along with the image stack 204.
[072] Thus, during the training phase, the 3D deep learning neural network receives 3D CT training data and positional features extracted from the 3D CT training data as input data, and the classified training voxels associated with the 3D CT training data are used as target data. An optimization method may be used to learn the optimal values of the parameters of the deep neural network by minimizing a loss function that represents the deviation of the output of the deep neural network from the target data (i.e., the classified voxel data), which represent the desired output for a predetermined input. When the minimization of the loss function converges to a certain value, the training process can be considered suitable for application.
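One such optimization step could look like the following sketch (in PyTorch; the use of a per-voxel cross-entropy loss and the tensor layout are assumptions of this example, since the disclosure does not fix a particular loss function or optimizer):

```python
import torch
import torch.nn as nn

def train_step(model, optimizer, inputs, targets):
    """One illustrative optimization step minimizing a per-voxel loss.

    inputs: (B, C, Z, Y, X) voxel blocks plus positional feature channels.
    targets: (B, Z, Y, X) integer class labels per voxel (target data).
    """
    optimizer.zero_grad()
    logits = model(inputs)                  # (B, num_classes, Z, Y, X)
    loss = nn.functional.cross_entropy(logits, targets)
    loss.backward()                         # backpropagate the deviation
    optimizer.step()                        # update the learnable parameters
    return loss.item()
```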
[073] The training process depicted in Figure 2, using 3D positional features in combination with training voxels that can be (at least partly) derived from optically scanned 3D data, provides a high-quality training set for the 3D deep learning neural network. After the training process, the trained network is capable of accurately classifying voxels of a 3D CT image data stack.
[074] Figures 4A and 4B represent high-level schematics of deep neural network architectures for use in the methods and systems described in this disclosure. The deep neural networks may be implemented using one or more 3D convolutional neural networks (3D CNNs). The convolutional layers may employ an activation function associated with the neurons in the layers, such as a sigmoid function, tanh function, relu function, softmax function, etc. A deep neural network may include a plurality of 3D convolutional layers, in which minor variations in the number of layers and their defining parameters, for example, different activation functions, kernel quantities and sizes, and additional functional layers such as dropout and batch normalization layers, may be used in the implementation without losing the essence of the design of the deep neural network.
[075] As shown in Figure 4A, the network may include a plurality of convolutional paths, in which each convolutional path is associated with a set of 3D convolutional layers. In one embodiment, the network may include at least two convolutional paths, a first convolutional path associated with a first set of 3D convolutional layers 406 and a second convolutional path associated with a second set of 3D convolutional layers 408. The first and second convolutional paths may be trained to encode 3D features derived from the received 3D image data associated with the voxels that are offered to the inputs of the first and second convolutional paths, respectively. In addition, in some embodiments, the network may include at least one further (third) convolutional path associated with a third set of 3D convolutional layers 407. The third convolutional path may be trained to encode 3D features derived from 3D positional feature data that are received with the voxels offered to the input of the third path.
[076] Alternatively, in one embodiment, instead of a further convolutional path that is trained on the basis of 3D positional feature data, the 3D positional feature data may be associated with the voxel intensity values that are offered to the inputs of the first and second convolutional paths. Therefore, in this embodiment, the first and second convolutional paths may be trained based on training data that include a 3D data stack of voxel values comprising both intensity values and positional feature information.
[077] The function of the different paths is illustrated in more detail in Figure 4B. As shown in this figure, voxels are fed to the input of the neural network. These voxels are associated with a predetermined volume, which may be referred to as the image volume 4013. The total voxel volume may be divided into first blocks of voxels, and the 3D convolutional layers of the first path 4031 may perform a 3D convolution operation on each of the first voxel blocks 4011 of the 3D image data. During the process, the output of each 3D convolutional layer may be the input of a subsequent 3D convolutional layer. In this way, each 3D convolutional layer can generate a 3D feature map representing features of the 3D image data fed to its input. A 3D convolutional layer configured to generate such feature maps may therefore be referred to as a 3D CNN feature layer.
[078] As shown in Figure 4B, the convolutional layers of the second convolutional path 4032 may be configured to process second voxel blocks 4012 of the 3D image data. Each second block of voxels is associated with a first block of voxels, where the first and second blocks of voxels have the same center point in the image volume. The volume of the second block is larger than the volume of the first block. Moreover, the second block of voxels represents a downsampled version of its associated first block of voxels. The downsampling may be based on a well-known interpolation algorithm. The downsampling factor may be any appropriate value. In one embodiment, the downsampling factor may be selected between 20 and 2, preferably between 10 and 3.
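By way of illustration, such a pair of blocks around a common center point could be extracted as in the following sketch (block size and downsampling factor are example values within the ranges mentioned above; border handling is omitted for brevity):

```python
import numpy as np
from scipy.ndimage import zoom

def context_pair(volume, center, small=32, factor=3):
    """Extract a first voxel block and its associated second block: the
    second covers `factor` times the real-world extent around the same
    center point and is downsampled by interpolation back to the grid
    size of the first block."""
    def crop(size):
        lo = [max(c - size // 2, 0) for c in center]
        return volume[lo[0]:lo[0] + size, lo[1]:lo[1] + size, lo[2]:lo[2] + size]
    first = crop(small)
    second = zoom(crop(small * factor), 1.0 / factor, order=1)  # linear interp
    return first, second
```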
[079] The 3D deep neural network may thus comprise at least two convolutional paths. A first convolutional path 4031 may define a first set of 3D CNN feature layers (for example, 5 to 20 layers), configured to process input data (for example, first blocks of voxels at predetermined positions in the image volume) at a first voxel resolution, for example, the target voxel resolution (that is, the voxel resolution of the 3D image data to be classified). Likewise, a second convolutional path may define a second set of 3D CNN feature layers (for example, 5 to 20 layers), configured to process input data at a second voxel resolution (for example, second blocks of voxels, where each block of the second voxel blocks 4012 has the same center point as its associated block of the first voxel blocks 4011). Here, the second resolution is lower than the first resolution. The second blocks of voxels therefore represent a larger volume in real-world dimensions than the first blocks. In this way, the second set of 3D CNN feature layers processes voxels in order to generate 3D feature maps that include information about the (direct) proximity of the associated voxels processed by the first set of 3D CNN feature layers.
[080] [080] The second path thus allows the neural network to determine contextual information, that is, information about the context (for example, the surroundings) of the voxels of the 3D image data that are presented to the input of the neural network. By using multiple (parallel) convolutional paths, both the 3D image data (the input data) and contextual information about the voxels of the 3D image data can be processed in parallel. The contextual information is useful for classifying dentomaxillofacial structures, which typically include closely packed dental structures that are difficult to distinguish, especially in the case of CBCT image data.
[081] [081] In an embodiment, the neural network of Figure 4B may additionally include a third convolutional path 4033 comprising a third set of 3D convolutional layers that are trained to process specific representations of 3D positional features 404 that can be extracted from the 3D image data. The extraction of the 3D positional features from the 3D image data may be performed in a pre-processing step. In an alternative embodiment, instead of using a third convolutional path to process the 3D positional features, 3D positional information, including the 3D positional features, may be associated with the 3D image data that are provided to the input of the deep neural network. In particular, a 3D data stack may be formed in which each voxel is associated with an intensity value and positional information. The positional information may thus be paired per applicable received voxel, for example by adding the 3D positional feature information as additional channels to the received 3D information. Hence, in this embodiment, a voxel of a voxel representation of a 3D dentomaxillofacial structure at the input of the deep neural network may be associated not only with a voxel value representing, for example, a radio intensity value, but also with 3D positional information. Thus, in this embodiment, during training of the convolutional layers of both the first and second convolutional paths, information derived from both 3D image features and 3D positional features may be encoded in these convolutional layers.
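As a minimal sketch of this alternative, the positional features can simply be appended as extra input channels per voxel; the feature names below are placeholders for whichever 3D positional features are used, and the random arrays stand in for real data.

```python
import numpy as np

# Toy block of 25**3 voxels; in practice these would come from the CT stack.
intensity = np.random.rand(25, 25, 25).astype(np.float32)  # radiodensity values
height = np.random.rand(25, 25, 25).astype(np.float32)     # e.g. a height feature
distance = np.random.rand(25, 25, 25).astype(np.float32)   # e.g. a distance feature

# Shape (channels, depth, height, width): one channel per information source.
network_input = np.stack([intensity, height, distance], axis=0)
```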
[082] [082] The outputs of the sets of 3D CNN feature layers are then merged and fed to the input of a set of fully connected 3D CNN layers 410, which are trained to derive the intended classification of voxels 412 that are provided to the input of the neural network and processed by the 3D CNN feature layers.
[083] [083] The sets of 3D CNN feature layers are trained (through their learnable parameters) to optimally derive and pass on the useful information that can be determined from their specific inputs; the fully connected layers encode parameters that determine how the information from the previous paths should be combined to provide optimally classified voxels 412. Thereafter, the classified voxels may be presented in the image space 414. Hence, the output of the neural network comprises voxels classified in an image space that corresponds to the image space of the voxels at the input.
[084] [084] Here, the output of the fully connected layers (the last layer) may provide a plurality of activations for each voxel. Such voxel activations may represent a probability measure (a prediction) defining the probability that a voxel belongs to one of a plurality of classes, for example dental structure classes such as a tooth, jaw and/or nerve structure. For each voxel, the voxel activations associated with the different dental structures may be thresholded in order to obtain a classified voxel.
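A sketch of this thresholding step is given below; the class ordering and the 0.5 threshold are assumptions made for illustration only.

```python
import numpy as np

def classify_voxels(activations, threshold=0.5):
    """activations: (num_classes, D, H, W) per-voxel probabilities, e.g. for
    the classes (background, jaw, teeth, nerve)."""
    labels = np.argmax(activations, axis=0)            # most probable class
    confident = np.max(activations, axis=0) >= threshold
    return np.where(confident, labels, 0)              # else fall back to background
```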
[085] [085] Figures 5 to 7 illustrate methods of determining 3D positional features in a 3D image data stack representing a 3D dentomaxillofacial structure, and examples of such positional features. Specifically, in the case of hand-engineered features, as described with reference to Figure 1, both the 3D image data stack and the associated 3D positional features are provided as input to the 3D deep neural network, so that the network can accurately classify the voxels without the risk of overfitting. A conversion based on real-world dimensions guarantees comparable input regardless of the resolution of the input image.
[086] [086] A hand-engineered 3D positional feature may provide the 3D deep neural network with information about positions of voxels in the image volume relative to a reference plane or a reference object in the image volume. For example, in an embodiment, a reference plane may be an axial plane in the image volume that separates voxels associated with the upper jaw from voxels associated with the lower jaw. In another embodiment, a reference object may include a curve, for example a 3D curve, approximating at least part of a dental arch of teeth in the 3D image data of the dentomaxillofacial structure. In this way, the positional features provide the first deep neural network with the means to encode abstractions indicating a probability of jaw, teeth and/or nerve tissue being associated with a voxel at different positions in the image volume. These positional features may help the deep neural network to efficiently and accurately classify voxels of a 3D image data stack, and are designed to reduce the risk of overfitting.
[087] [087] In order to determine reference planes and/or reference objects in the image volume that are useful in the classification process, the feature analysis function may determine voxels having a predetermined intensity value, or having an intensity value above or below a predetermined intensity value. For example, voxels associated with bright intensity values may relate to teeth and/or jaw tissue. In this way, information about the position of the teeth and/or jaw and their orientation (for example, a rotational angle) in the image volume can be determined by the computer. If the feature analysis function determines that the rotational angle is larger than a predetermined amount (for example, larger than 15 degrees), the function may correct the rotation back to zero, which is beneficial for accurate results.
[088] [088] Figure 5A illustrates an example of a flow diagram 502 of a method of determining hand-engineered 3D positional features from 3D image data 504, for example a 3D CT image data stack. This process may include determining one or more 3D positional features of the dentomaxillofacial structure, wherein the one or more 3D positional features are configured to be provided to the input of the 3D deep neural network (as discussed with reference to Figure 4B above). A hand-engineered 3D positional feature defines position information of the voxels in the image volume relative to reference planes or reference objects in the image volume, for example a distance, for example a perpendicular distance, between voxels in the image volume and a reference plane in the image volume that separates the upper and lower jaws. It may also define a distance between voxels in the image volume and a dental reference object, for example a dental arch in the image volume. It may additionally define positions of accumulated intensity values in a second reference plane of the image volume, an accumulated intensity value at a point in the second reference plane including accumulated intensity values of voxels at or in the vicinity of the normal running through the point in the reference plane. Examples of 3D positional features are described hereunder.
[089] [089] In order to determine a reference object that provides positional information about the dental arch in the 3D image data of the dentomaxillofacial structure, a fitting algorithm may be used to determine a curve, for example a curve following a polynomial formula, that fits predetermined points in a cloud of points of different (accumulated) intensity values.
[090] [090] In an embodiment, a point cloud of intensity values in an axial plane (an xy plane) of the image volume may be determined. An accumulated intensity value of a point in such an axial plane may be determined by summing the values of the voxels positioned on the normal running through that point in the axial plane. The intensity values thus obtained in the axial plane can then be used to find a curve that approximates the dental arch of the teeth.
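A minimal sketch of this step is shown below, assuming the volume is ordered (z, y, x), that a simple intensity threshold selects the bright points, and that a second-degree polynomial (as mentioned with reference to Figure 7B) is fitted to them.

```python
import numpy as np

def fit_dental_arch(volume, threshold):
    """volume: (z, y, x) array of radiodensity values."""
    summed = volume.sum(axis=0)              # accumulated intensity per (y, x) point
    ys, xs = np.nonzero(summed > threshold)  # bright points: likely teeth/jaw
    coeffs = np.polyfit(xs, ys, deg=2)       # arch approximated as y = p(x)
    return np.poly1d(coeffs)
```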
[091] [091] Figure 5B depicts an example of a machine learning method as it may be used to generate relevant (non-hand-engineered) 3D positional features, according to an embodiment of the invention. In particular, Figure 5B depicts an exemplary 3D deep neural network architecture as it may be trained to generate the desired features to be processed by the 3D segmentation deep neural network. After training, this trained model may be used, analogously to method 502, as a pre-processor that derives relevant 3D positional features on the basis of the entire received 3D data set.
[092] [092] As with the hand-engineered 3D positional features, the goal is to incorporate into the 3D positional features information that considers the entire received 3D data set (or at least a substantial part of it), that is potentially relevant for the task of automatic classification and segmentation, and that may otherwise not be available in the set or subsamples offered to the 3D segmentation deep learning network. Again, as with the hand-engineered 3D positional features, this information must be made available per voxel of the received 3D data set.
[093] [093] One possible way of implementing such a machine learning method for automatically generating 3D positional features is a trained deep neural network. Such a network may be trained to derive 3D positional features on the basis of a 3D input data set (for example, a voxel representation of a dentomaxillofacial structure) that is provided to the input of the 3D segmentation deep neural network. In an embodiment, the pre-processing deep neural network may be a 3D U-net type deep neural network, as illustrated by Figure 5B. Due to practical processing limits (mainly memory requirements), such an architecture would not operate at the resolutions of the received voxel representations. Therefore, a first 3D input data set, a first voxel representation of a first resolution (for example, 0.2x0.2x0.2 mm per voxel), may be downsampled to a second voxel representation of a lower, second resolution, for example a resolution of 1x1x1 mm per voxel, using an interpolation algorithm. Thereafter, a 3D deep neural network that is trained on the basis of voxel representations of the second resolution may infer 3D positional feature information per voxel. An interpolation algorithm may be used to upsample this information back to the original, first resolution. In this way, the resulting 3D positional features (spatially) coincide with the voxels of the first voxel representation, providing relevant information for each voxel of the first 3D input data set that takes into account (an aggregated version of) the entire received 3D data set.
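The downsample-predict-upsample pipeline described above might be sketched as follows; `prednet` is a hypothetical stand-in for the trained pre-processing network, and the factor of 5 reflects the example resolutions (0.2 mm and 1.0 mm per voxel).

```python
import numpy as np
from scipy.ndimage import zoom

def positional_features(volume, prednet, factor=5):
    low_res = zoom(volume, 1.0 / factor)    # e.g. 0.2 mm -> 1.0 mm voxels
    coarse = prednet(low_res)               # per-voxel positional feature prediction
    # Upsample so every voxel of the original volume has a matching feature value.
    scale = [t / s for t, s in zip(volume.shape, coarse.shape)]
    return zoom(coarse, scale, order=1)     # linear interpolation back to 0.2 mm
```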
[094] [094] This pre-processing 3D deep neural network may be trained to approximate desired target values (which are the desired 3D positional features). In this specific example, the targets may, for example, be a class indication per voxel at the resolution at which the pre-processing 3D deep neural network operates. These class indications may, for example, originate from the same set of classified training voxels 136, but downsampled in the same manner as the received 3D data set.
[095] [095] Note that such an exemplary implementation of a pre-processing machine learning method could effectively be considered a coarse pre-segmentation of the 3D image data, performed at a reduced resolution.
[096] [096] The pre-processing network may be implemented using a variety of 3D neural network layers, such as 3D convolutional layers (3D CNNs), 3D max pooling layers, 3D deconvolutional layers (3D de-CNNs), and densely connected layers. The layers may use a variety of activation functions such as linear, tanh, ReLU, PReLU, sigmoid, etc. The 3D CNN and de-CNN layers may vary in their number of filters, filter sizes and subsampling parameters. The 3D CNN and de-CNN layers, as well as the densely connected layers, may vary in their parameter initialization methods. Dropout and/or batch normalization layers may be used throughout the architecture.
[097] [097] Following the 3D U-net architecture, during training the various filters within the 3D CNN and 3D de-CNN layers learn to encode meaningful features that aid the accuracy of prediction. During training, matching sets of 3D image data 522 and encoded matching 3D positional features 560 are used to optimize the prediction of the latter from the former. A loss function may be used as the measure to be minimized. This optimization effort may be aided by the use of optimizers such as SGD, Adam, etc.
[098] [098] Such an architecture may employ various internal resolution scales, effectively downsampling 526, 530, 534 the results of a previous set of 3D CNN layers 524, 528, 532 through, for example, max pooling or subsampled 3D convolutions. The term 'meaningful features' here refers to (successive) derivations of information relevant for determining the target output values; these are also encoded through the 3D de-CNN layers, which effectively perform an upsampling while employing their filters. By combining 540, 546, 552 the data resulting from these 3D de-CNN layers 538, 544, 554 with the data from the 'last' 3D CNN layers operating at the same resolution (532 to 540, 528 to 546 and 524 to 552), highly accurate predictions can be achieved. Along the upsampling path of the network, additional 3D CNN layers may be used 542, 548, 554.
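For illustration, a heavily reduced PyTorch sketch of such a U-net-style network with a single down/up scale is given below. The layer counts, channel widths and kernel sizes are arbitrary choices, not the configuration described here; only the pattern of downsampling, upsampling via a 3D de-CNN, and combining same-resolution data via a skip connection is the point.

```python
import torch
import torch.nn as nn

class MiniUNet3D(nn.Module):
    def __init__(self, in_ch=1, out_ch=3, base=16):
        super().__init__()
        # Contracting path: 3D convolution followed by max pooling (downsampling).
        self.enc1 = nn.Sequential(nn.Conv3d(in_ch, base, 3, padding=1),
                                  nn.ReLU(inplace=True))
        self.pool = nn.MaxPool3d(2)
        self.enc2 = nn.Sequential(nn.Conv3d(base, base * 2, 3, padding=1),
                                  nn.ReLU(inplace=True))
        # Expanding path: a transposed 3D convolution (de-CNN) restores resolution.
        self.up = nn.ConvTranspose3d(base * 2, base, 2, stride=2)
        # After concatenating the same-resolution skip data, fuse and predict.
        self.dec = nn.Sequential(nn.Conv3d(base * 2, base, 3, padding=1),
                                 nn.ReLU(inplace=True),
                                 nn.Conv3d(base, out_ch, 1))

    def forward(self, x):
        e1 = self.enc1(x)                        # full-resolution features
        e2 = self.enc2(self.pool(e1))            # half-resolution features
        u = self.up(e2)                          # upsampled back to full resolution
        return self.dec(torch.cat([u, e1], 1))   # combine and predict per voxel

# Usage: one single-channel block of 32**3 voxels -> per-voxel class scores.
net = MiniUNet3D()
out = net(torch.zeros(1, 1, 32, 32, 32))         # shape (1, 3, 32, 32, 32)
```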
[099] [099] When used for inference, having been trained such that its internal parameters produce sufficiently accurate results on validation, an input sample may be presented to the 3D deep learning network, and the network may yield predicted 3D positional features 542.
[0100] [0100] An example of a reference object for use in determining hand-engineered 3D positional features, in this case a curve approximating a dental arch, is provided in Figure 6. In this example, a point cloud in the axial (xy) plane is shown, in which areas of high intensity values (bright white areas) may indicate areas of tooth or jaw structures. In order to determine a dental arch curve, the computer may determine areas in an axial plane of the image volume associated with bright voxels (for example, voxels having an intensity value above a predetermined threshold value) that may be identified as teeth or jaw voxels. These high-intensity areas may be used to determine a ridge of bright areas that approximates the dentomaxillofacial arch. In this way, a dental arch curve may be determined that approximates an average of the dentomaxillofacial arches of the upper and lower jaw, respectively. In another embodiment, separate dental arch curves associated with the upper and lower jaw may be determined.
[0101] [0101] Figures 7A to 7E represent examples of 3D positional features of 3D image data, according to various embodiments of the invention.
[0102] [0102] Figure 7A depicts an image (left) of a slice of a sagittal plane of a 3D image data stack and an associated visualization (right) of a so-called height feature of the same slice. This height feature may encode the z position (a height 704) of each voxel in the image volume of the 3D CT image data stack relative to a reference plane 702. The reference plane (for example, an axial or xy plane) is determined to be (the best approximation of) the plane with approximately equal distance to both the upper and lower jaw and their constituent teeth.
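A sketch of how such a height feature could be computed per voxel is given below; the reference-plane slice index `z_ref` is assumed to be known (its estimation is not shown), and the 0.2 mm voxel size is the example value used elsewhere in this document.

```python
import numpy as np

def height_feature(shape, z_ref, voxel_size_mm=0.2):
    """shape: (z, y, x) of the image volume; returns signed height in mm."""
    z = np.arange(shape[0]).reshape(-1, 1, 1)        # slice index per z position
    height = (z - z_ref) * voxel_size_mm             # signed distance to the plane
    return np.broadcast_to(height, shape).astype(np.float32)
```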
[0103] [0103] Other 3D positional features may be defined to encode spatial information in the xy space of a 3D image data stack. In an embodiment, such a positional feature may be based on a curve approximating (part of) the dental arch. Such a positional feature is illustrated in Figure 7B, which depicts a slice (left) of a 3D image data stack and a visualization (right) of a so-called travel feature for the same slice. This travel feature is based on the curve approximating the dental arch 706, and defines the relative distance 708 measured along that curve. Here, zero distance may be defined as the point 710 where the derivative of the second-degree polynomial curve is (approximately) zero. The travelled distance increases when moving in either direction along the x axis from this point (i.e., the point where the derivative is zero).
[0104] [0104] A further 3D positional feature based on the dental arch curve may define the shortest (perpendicular) distance of each voxel in the image volume to the dental arch curve 706. This positional feature may therefore be referred to as the 'distance feature'. An example of this feature is provided in Figure 7C, which depicts a slice (left) of the 3D image data stack and a visualization (right) of the distance feature for the same slice. For this feature, zero distance means that the voxel is positioned on the dental arch curve 708.
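A brute-force sketch of the distance feature is shown below, reusing the fitted arch polynomial from the earlier sketch; sampling the curve at discrete points and taking the minimum in-plane distance is a simplifying assumption made for clarity.

```python
import numpy as np

def distance_feature(shape, arch_poly, num_samples=200):
    """shape: (z, y, x); arch_poly: np.poly1d mapping x -> y (see fit_dental_arch)."""
    xs = np.linspace(0, shape[2] - 1, num_samples)
    curve = np.stack([arch_poly(xs), xs], axis=1)            # sampled (y, x) points
    yy, xx = np.mgrid[0:shape[1], 0:shape[2]]
    grid = np.stack([yy.ravel(), xx.ravel()], axis=1)        # every (y, x) position
    d = np.sqrt(((grid[:, None, :] - curve[None, :, :]) ** 2).sum(-1)).min(axis=1)
    plane = d.reshape(shape[1], shape[2])                    # distance per (y, x)
    return np.broadcast_to(plane, shape)                     # identical per slice
```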
[0105] [0105] Yet another 3D positional feature may define position information of individual teeth. An example of this feature (which may also be referred to as a dental feature) is provided in Figure 7D, which depicts a slice (left) of the 3D image data stack and a visualization (right) of the dental feature for the same slice. The dental feature may provide information to be used for determining the likelihood of finding voxels of certain teeth at a given position in the voxel space. Following a given reference plane such as 702, this feature may encode a separate sum of voxels along the normal to any plane (for example, the xy plane or any other plane). This information thus provides the neural network with a 'view' of all information of the original space summed along the normal of the chosen plane. This view is larger than it would be when excluding this feature, and may provide a means of differentiating whether a hard structure is present, based on all information in the chosen direction of the space (as illustrated at 7121.2 for the xy plane).
[0106] [0106] Figure 7E shows a visualization of 3D positional features that may be generated by a machine learning pre-processor, in particular a 3D deep neural network as described in relation to Figure 5B. These 3D positional features were rendered on a computer, and the 3D volumes shown are the result of thresholding the predicted values. From the relative 'roughness' of the surfaces defining the volumes, it can be seen that this network and its input and target data operated at a lower 3D resolution than that of the final voxel representation to be segmented (in the case of this example, a resolution of 1x1x1 mm per voxel was used). As targets, the same training data may be used as for the 3D segmentation deep learning network, but downsampled to an applicable resolution that adheres to the processing requirements for use by a pre-processing 3D deep neural network. In effect, this leads to 3D positional features that contain a 'coarse' pre-segmentation of, in the case of this example, jaw 720, teeth 722 and nerve structures 724. For the purposes of this illustration, the lower jaw of this particular patient was not rendered, in order to show the voxels classified as most likely belonging to the nerve structure.
[0107] [0107] This coarse pre-segmentation may be appropriately upsampled, for example by means of interpolation, ensuring that, per voxel at the desired segmentation resolution (which is the resolution as originally received), the information of this pre-segmentation spatially coincides at the desired resolution. For example, the information from one voxel in the visualization shown may spatially coincide with 5x5x5 voxels at the desired resolution, and that information should be matched to all 125 applicable voxels at the desired resolution. Subsequently, this upsampled information can be presented as, or included in, a set of 3D positional features and, as described in relation to Figure 4, be fed to the 3D segmentation deep neural network as input.
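The matching step for the 5x5x5 example above can be sketched with a simple nearest-neighbour repetition; interpolation of other orders would work equally well, and the factor of 5 follows the example resolutions.

```python
import numpy as np

def upsample_presegmentation(coarse, factor=5):
    """Repeat every coarse voxel so it covers the factor**3 fine voxels it
    spatially coincides with at the originally received resolution."""
    fine = coarse
    for axis in range(3):
        fine = np.repeat(fine, factor, axis=axis)
    return fine
```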
[0108] [0108] Hence, Figures 5 to 7 show that a 3D positional feature defines information about voxels of a voxel representation that is provided to the input of a deep neural network that is trained to classify voxels. The information may be aggregated from all (or a substantial part of) the information available in the voxel representation, wherein, during aggregation, the position relative to a dental reference object may be taken into account. Further, the information is aggregated such that it can be processed per position of a voxel in the first voxel representation.
[0109] [0109] Figures 8A to 8D depict examples of the output of a trained deep learning neural network, according to an embodiment of the invention. In particular, Figures 8A to 8D depict 3D images of voxels that were classified using a deep learning neural network trained using a training method as described with reference to Figure 2. As shown in Figures 8B to 8D, voxels may be classified by the neural network into voxels belonging to tooth structures (Figure 8B), jaw structure (Figure 8C) or nerve structures (Figure 8D). Figure 8A depicts a 3D image including the voxels that the deep learning neural network classified as teeth, jaw and nerve tissue. As shown by Figures 8B to 8D, the classification process is accurate, but there are still quite a number of voxels that are missed or misclassified. For example, as shown in Figures 8B and 8C, voxels that may be part of the jaw structure are classified as teeth voxels, while voxels at surfaces belonging to the roots of the teeth are missed. As shown in Figure 8D, this problem is even more pronounced for the classified nerve voxels.
[0110] [0110] In order to address the problem of outliers in the classified voxels (which form the output of the first deep learning neural network), the voxels may be post-processed. Figure 9 depicts a flow diagram of post-processing classified voxels of 3D dentomaxillofacial structures, according to an embodiment of the invention. In particular, Figure 9 depicts a flow diagram of post-processing voxel data of dentomaxillofacial structures that were classified using a deep learning neural network as described with reference to Figures 1 to 8 of this application.
[0111] [0111] As shown in Figure 9, the process may include a step of dividing the classified voxel data 902 of a dentomaxillofacial structure into voxels that are classified as jaw voxels 904, voxels that are classified as teeth voxels 906, and voxels that are classified as nerve data 908. As will be described hereunder in more detail, the jaw and teeth voxels are post-processed using a second, additional deep learning neural network.
[0112] [0112] The post-processing deep learning neural network encodes representations of both the teeth and the jaw. During training of the post-processing deep learning neural network, the parameters of the neural network are tuned such that the output of the first deep learning neural network is translated into the most feasible 3D representation of these dentomaxillofacial structures. In this way, imperfections in the classified voxels can be reconstructed 912. Additionally, the surface of the 3D structures can be smoothed 914, so that the best feasible 3D jaw and teeth models can be generated. Omitting the 3D CT image data stack as a source of information for this post-processing step makes the step robust against unwanted variations within the image stack.
[0113] [0113] Due to the nature of (CB)CT images, the output of the first deep learning neural network will suffer from the (previously mentioned) potential artefacts, such as averaging due to patient motion, beam hardening, etc. Another source of noise is variance in the image data captured by different CT scanners. This variance results in the introduction of various factors, such as varying amounts of noise within the image stack, varying voxel intensity values representing the same (real-world) density, and potentially others. The effects that the above-mentioned artefacts and noise sources have on the output of the first deep learning neural network can be removed, or at least substantially reduced, by the post-processing deep learning neural network, leading to segmented jaw voxels 918 and segmented teeth voxels 920.
[0114] [0114] The classified nerve data 908 may be post-processed separately from the jaw and teeth data. The nature of the nerve data, which represent thin filament-like structures in the CT image data stack, makes these data less suitable for post-processing by a deep learning neural network. Instead, the classified nerve data are post-processed using an interpolation algorithm in order to procure segmented nerve data 916. To that end, voxels that are classified as nerve voxels and that are associated with a high probability (for example, a probability of 95% or more) are used by a fitting algorithm to construct a 3D model of the nerve structures. Thereafter, the 3D models of the jaw, teeth and nerves can be combined into a 3D model of the dentomaxillofacial structure.
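A sketch of such a fitting step is given below. The 95% threshold follows the text; fitting a smooth spline through the high-probability voxels with scipy.interpolate.splprep, and ordering the points along a single axis, are simplifying assumptions.

```python
import numpy as np
from scipy.interpolate import splprep, splev

def fit_nerve_curve(nerve_prob, threshold=0.95, num_points=100):
    """nerve_prob: (z, y, x) per-voxel nerve probabilities from the first network."""
    z, y, x = np.nonzero(nerve_prob >= threshold)     # high-probability nerve voxels
    order = np.argsort(x)                             # assume the canal runs along x
    tck, _ = splprep([z[order], y[order], x[order]], s=float(len(z)))
    u = np.linspace(0.0, 1.0, num_points)
    return np.stack(splev(u, tck), axis=1)            # (num_points, 3) centerline
```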
[0115] [0115] Figure 10 depicts an example of an architecture of a deep learning neural network that is configured to post-process voxels classified from a 3D dentomaxillofacial structure, according to an embodiment of the invention. The post-processing deep learning neural network may have an architecture similar to that of the first deep learning neural network, including a first path formed by a first set of 3D CNN feature layers 1004, which is configured to process the input data (in this case, a part of the classified voxel data) at the target resolution. The deep learning neural network additionally includes a second set of 3D CNN feature layers 1006, which is configured to process the context of the input data that are processed by the first 3D CNN feature layers, but at a resolution lower than the target. The outputs of the first and second sets of 3D CNN feature layers are then fed to the input of a set of fully connected 3D CNN layers 1008 in order to reconstruct the classified voxel data such that they closely represent a 3D model of the 3D dentomaxillofacial structure. The output of the fully connected 3D CNN layers provides the reconstructed voxel data.
[0116] [0116] The post-processing neural network may be trained using the same targets as the first deep learning neural network, which represent the same desired output. During training, the network is regularized as broadly as possible, with noise applied to the inputs so that exceptional cases are represented. Inherent to the nature of the post-processing deep learning neural network, the processing it performs also results in the removal of non-viable aspects of the received voxel data. Factors here include the smoothing and filling-in of the desired dentomaxillofacial structures, and the outright removal of non-viable voxel data.
[0117] [0117] Figures 11A and 11B depict an iteration of the post-processing network resulting in surface reconstruction of classified voxels, according to an embodiment of the invention. In particular, Figure 11A depicts an image of voxels classified as tooth structures, wherein the voxels are the output of the first deep learning neural network. As shown in the figure, noise and other artefacts in the input data result in irregularities and artefacts in the voxel classification and, hence, in 3D surface structures that include gaps in the voxel sets representing a tooth structure. These irregularities and artefacts are especially visible at the inferior alveolar nerve structure 11021 and at the dental root structures 11041 of the teeth, that is, in areas where the deep learning neural network has to distinguish between voxels of teeth and voxels that are part of the jaw.
[0118] [0118] Figure 11B depicts the result of the post-processing according to the process described with reference to Figures 9 and 10. As shown in this figure, the post-processing deep learning neural network successfully removes artefacts that were present in the input data (the classified voxels). The post-processing step successfully reconstructs parts that were substantially affected by irregularities and artefacts, such as the root structures 11041 of the teeth, which now exhibit smooth surfaces providing an accurate 3D model of the individual tooth structures 11042. High-probability nerve voxels 11021 (for example, with a probability of 95% or more) are used by a fitting algorithm in order to reconstruct a 3D model of the nerve structures.
[0119] [0119] While the figures depict the 3D deep neural networks as separate neural networks, each having a certain function, for example pre-processing, classification and segmentation, and post-processing, these neural networks may also be connected to each other, forming one or two deep neural networks that include the desired functionality. In that case, the different neural networks may be trained separately (for example, as described with reference to the figures of this disclosure). Thereafter, the trained networks may be connected to each other, forming one deep neural network.
[0120] [0120] Figure 12 is a block diagram illustrating exemplary data processing systems described in this disclosure. Data processing system 1200 may include at least one processor 1202 coupled to memory elements 1204 via a bus system 1206. As such, the data processing system may store program code within memory elements 1204. Furthermore, processor 1202 may execute the program code accessed from memory elements 1204 via bus system 1206. In one aspect, the data processing system may be implemented as a computer that is suitable for storing and/or executing program code. It should be noted, however, that data processing system 1200 may be implemented in the form of any system including a processor and memory that is capable of performing the functions described within this specification.
[0121] [0121] Memory elements 1204 may include one or more physical memory devices such as, for example, local memory 1208 and one or more mass storage devices 1210. Local memory may refer to random access memory or other non-persistent memory device(s) generally used during actual execution of the program code. A mass storage device may be implemented as a hard drive or other persistent data storage device. Processing system 1200 may also include one or more cache memories (not shown) that provide temporary storage of at least some program code in order to reduce the number of times program code must be retrieved from mass storage device 1210 during execution.
[0122] [0122] Input/output (I/O) devices depicted as input device 1212 and output device 1214 can optionally be coupled to the data processing system. Examples of input devices may include, but are not limited to, for example, a keyboard, a pointing device such as a mouse, or the like. Examples of output devices may include, but are not limited to, for example, a monitor or display, speakers, or the like.
[0123] [0123] As shown in Figure 12, memory elements 1204 may store an application 1218. It should be noted that data processing system 1200 may further execute an operating system (not shown) that can facilitate execution of the application. The application, implemented in the form of executable program code, can be executed by data processing system 1200, for example, by processor 1202. Responsive to executing the application, the data processing system may be configured to perform one or more operations to be described herein in further detail.
[0124] [0124] In one aspect, for example, data processing system 1200 may represent a client data processing system. In that case, application 1218 may represent a client application that, when executed, configures data processing system 1200 to perform the various functions described herein with reference to a "client". Examples of a client may include, but are not limited to, a personal computer, a portable computer, a mobile phone, or the like.
[0125] [0125] In another aspect, the data processing system may represent a server. For example, the data processing system may represent an (HTTP) server, in which case application 1218, when executed, may configure the data processing system to perform (HTTP) server operations. In another aspect, the data processing system may represent a module, unit or function as referred to in this specification.
[0126] [0126] The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
[0127] [0127] The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.
Claims (15)
1. Computer-implemented method for processing 3D data representing a dentomaxillofacial structure, characterized by comprising: a computer receiving 3D input data, preferably 3D cone beam CT (CBCT) data, the 3D input data including a first voxel representation of the dentomaxillofacial structure, a voxel being associated with a radiation intensity value, wherein the voxels of the voxel representation define an image volume; a pre-processing algorithm using the 3D input data to determine one or more 3D positional features of the dentomaxillofacial structure, a 3D positional feature defining information about positions of voxels of the first voxel representation relative to the position of a dental reference plane, for example an axial plane positioned relative to a jaw, or the position of a dental reference object, for example a jaw, a dental arch and/or one or more teeth, in the image volume; the computer providing the first voxel representation and the one or more 3D positional features associated with the first voxel representation to the input of a first 3D deep neural network, preferably a 3D convolutional deep neural network, the first deep neural network being configured to classify voxels of the first voxel representation into at least jaw, teeth and/or nerve voxels; the first neural network being trained on the basis of a training set, wherein the training set includes 3D image data of dentomaxillofacial structures, one or more 3D positional features derived from the 3D image data of the training set and, optionally, one or more 3D models of parts of the dentomaxillofacial structures of the 3D image data of the training set, the one or more 3D models being used as a target during training of the first deep neural network, preferably at least part of the one or more 3D models being generated by optically scanning parts of the dentomaxillofacial structures of the 3D image data of the training set; the computer receiving classified voxels of the first voxel representation from the output of the first 3D deep neural network and determining a voxel representation of at least one of the jaw, teeth and/or nerve tissue of the dentomaxillofacial structure on the basis of the classified voxels.
2. Method according to claim 1, characterized in that the determination of the one or more 3D positional features by the pre-processing algorithm includes: determining a distance between a voxel of the voxel representation and a dental reference plane and/or a dental reference object in the image volume; and/or determining accumulated intensity values of voxels at one or more points of a reference plane of the image volume, an accumulated intensity value at a point in the reference plane including accumulated intensity values of voxels at or in the vicinity of the normal running through the point in the reference plane.
3. Method according to claim 2, characterized in that the dental reference plane includes an axial plane in the image volume positioned at a predetermined distance from the upper and/or lower jaw of the dentomaxillofacial structure, preferably at approximately equal distance from the upper and lower jaw of the dentomaxillofacial structure; or, in that the dental reference object includes a dental arch curve approximating at least part of a dental arch as represented by the dentomaxillofacial structure, the dental arch curve preferably being determined in an axial plane of the image volume; and/or, in that the dental reference object includes one or more teeth.
4. Method according to claim 1, characterized in that the pre-processing algorithm includes a second 3D deep neural network, the second deep neural network being trained to receive a second voxel representation at its input and to determine, for each voxel of the second voxel representation, a 3D positional feature, preferably the 3D positional feature including a measure indicating a probability that a voxel represents jaw, teeth and/or nerve tissue, wherein the second voxel representation is a low-resolution version of the first voxel representation, preferably the resolution of the second voxel representation being at least three times lower than the resolution of the first voxel representation, the second 3D deep neural network preferably being trained on the basis of the 3D image data of dentomaxillofacial structures and, optionally, the one or more 3D models of parts of the dentomaxillofacial structures of the 3D image data of the training set used to train the first deep neural network.
5. Method according to claim 1, characterized in that providing the first voxel representation and the one or more 3D positional features associated with the first voxel representation to the input of the first 3D deep neural network further comprises: associating each voxel of the first voxel representation with at least information defined by a 3D positional feature; dividing the first voxel representation into first blocks of voxels; providing a first block of voxels to the input of the first deep neural network, wherein each voxel of the first block of voxels is associated with a radiation intensity value and at least information defined by a 3D positional feature.
6. Method according to any one of claims 1 to 5, characterized in that the first deep neural network comprises a plurality of first 3D convolutional layers, wherein the output of the plurality of first 3D convolutional layers is connected to at least one fully connected layer, wherein the plurality of first 3D convolutional layers is configured to process a first block of voxels of the first voxel representation, and wherein the at least one fully connected layer is configured to classify voxels of the first block of voxels into at least one of jaw, teeth and/or nerve voxels, preferably each voxel provided to the input of the first deep neural network comprising a radiation intensity value and at least one 3D positional feature.
7. Method according to claim 6, characterized in that the first deep neural network further comprises a plurality of second 3D convolutional layers, the output of the plurality of second 3D convolutional layers being connected to the at least one fully connected layer, wherein the plurality of second 3D convolutional layers is configured to process a second block of voxels of the first voxel representation, wherein the first and second blocks of voxels have the same or substantially the same center point in the image volume and the second block of voxels represents a volume in real-world dimensions that is larger than the volume in real-world dimensions of the first block of voxels, and wherein the plurality of second 3D convolutional layers is configured to determine contextual information associated with voxels of the first block of voxels provided to the input of the plurality of first 3D convolutional layers.
8. Method according to claim 6 or 7, characterized in that the first deep neural network further comprises a plurality of third 3D convolutional layers, the output of the plurality of third 3D convolutional layers being connected to the at least one fully connected layer, wherein the plurality of third 3D convolutional layers is configured to process one or more 3D positional features associated with voxels of at least the first block of voxels provided to the input of the plurality of first 3D convolutional layers.
9. Method according to any one of claims 1 to 8, characterized in that it further comprises: a third deep neural network for post-processing voxels classified by the first deep neural network, the third deep neural network being trained to receive voxels classified by the first deep neural network at its input and to correct voxels that are incorrectly classified by the first deep neural network, wherein preferably the third neural network is trained on the basis of voxels classified during training of the first deep neural network as input and, optionally, on the basis of the one or more 3D models of parts of the dentomaxillofacial structures of the 3D image data of the training set as a target.
10. Computer-implemented method for training a deep neural network system to process 3D image data of a dentomaxillofacial structure, characterized by comprising: a computer receiving training data, the training data including: 3D input data, preferably 3D cone beam CT (CBCT) image data, the 3D input data defining one or more voxel representations of one or more dentomaxillofacial structures respectively, a voxel being associated with a radiation intensity value, wherein the voxels of a voxel representation define an image volume; and optionally, the training data further including: 3D models of parts of the dentomaxillofacial structures represented by the 3D input data of the training data; the computer using a pre-processing algorithm to pre-process the one or more voxel representations of the one or more dentomaxillofacial structures respectively, to determine one or more 3D positional features for voxels in the one or more voxel representations, a 3D positional feature defining information about a position of at least one voxel of a voxel representation of the dentomaxillofacial structures relative to the position of a dental reference plane (for example, an axial plane positioned relative to a jaw) or the position of a dental reference object (for example, a jaw, a dental arch and/or one or more teeth) in the image volume; and, using the training data and the one or more 3D positional features to train the first deep neural network to classify voxels into jaw, teeth and/or nerve voxels.
11. Method according to claim 10, characterized by further comprising: using voxels classified during training of the first deep neural network and the one or more 3D models of parts of the dentomaxillofacial structures of the 3D image data of the training set to train a third neural network to post-process voxels classified by the first deep neural network, wherein the post-processing by the third neural network includes correcting voxels that are incorrectly classified by the first deep neural network.
12. Computer system adapted to process 3D image data of a dentomaxillofacial structure, characterized by comprising: a computer-readable storage medium having computer-readable program code embodied therein, the computer-readable program code including a pre-processing algorithm and a first deep neural network; and a processor, preferably a microprocessor, coupled to the computer-readable storage medium, wherein, responsive to executing the computer-readable program code, the processor is configured to perform executable operations comprising: receiving 3D input data, preferably 3D cone beam CT (CBCT) data, the 3D input data including a first voxel representation of the dentomaxillofacial structure, a voxel being associated with a radiation intensity value, wherein the voxels of the voxel representation define an image volume; the pre-processing algorithm using the 3D input data to determine one or more 3D positional features of the dentomaxillofacial structure, a 3D positional feature defining information about positions of voxels of the first voxel representation relative to the position of a dental reference plane, for example an axial plane positioned relative to a jaw, or the position of a dental reference object, for example a jaw, a dental arch and/or one or more teeth, in the image volume; providing the first voxel representation and the one or more 3D positional features associated with the first voxel representation to the input of a first 3D deep neural network, preferably a 3D convolutional deep neural network, the first deep neural network being configured to classify voxels of the first voxel representation into at least jaw, teeth and/or nerve voxels; the first neural network being trained on the basis of a training set, wherein the training set includes 3D image data of dentomaxillofacial structures, one or more 3D positional features derived from the 3D image data of the training set and, optionally, one or more 3D models of parts of the dentomaxillofacial structures of the 3D image data of the training set, the one or more 3D models being used as a target during training of the first deep neural network, preferably at least part of the one or more 3D models being generated by optically scanning parts of the dentomaxillofacial structures of the 3D image data of the training set; and receiving classified voxels of the first voxel representation from the output of the first 3D deep neural network and determining a voxel representation of at least one of the jaw, teeth and/or nerve tissue of the dentomaxillofacial structure on the basis of the classified voxels.
13. Computer system according to claim 12, characterized in that the pre-processing algorithm includes a second 3D deep neural network, the second deep neural network being trained to receive a second voxel representation at its input and to determine, for each voxel of the second voxel representation, a 3D positional feature, preferably the 3D positional feature including a measure indicating a probability that a voxel represents jaw, teeth and/or nerve tissue, wherein the second voxel representation is a low-resolution version of the first voxel representation, preferably the resolution of the second voxel representation being at least three times lower than the resolution of the first voxel representation, the second 3D deep neural network preferably being trained on the basis of the 3D image data of dentomaxillofacial structures and, optionally, the one or more 3D models of parts of the dentomaxillofacial structures of the 3D image data of the training set used to train the first deep neural network.
14. Computer system according to claim 13, characterized in that the first deep neural network comprises: a plurality of first 3D convolutional layers, wherein the output of the plurality of first 3D convolutional layers is connected to at least one fully connected layer, wherein the plurality of first 3D convolutional layers is configured to process a first block of voxels of the first voxel representation, and wherein the at least one fully connected layer is configured to classify voxels of the first block of voxels into at least one of jaw, teeth and/or nerve voxels, preferably each voxel provided to the input of the first deep neural network comprising a radiation intensity value and at least one 3D positional feature; and, optionally, the first deep neural network further comprises: a plurality of second 3D convolutional layers, the output of the plurality of second 3D convolutional layers being connected to the at least one fully connected layer, wherein the plurality of second 3D convolutional layers is configured to process a second block of voxels of the first voxel representation, wherein the first and second blocks of voxels have the same or substantially the same center point in the image volume and the second block of voxels represents a volume in real-world dimensions that is larger than the volume in real-world dimensions of the first block of voxels, and wherein the plurality of second 3D convolutional layers is configured to determine contextual information associated with voxels of the first block of voxels provided to the input of the plurality of first 3D convolutional layers.
15. Computer program product characterized by comprising software code portions configured to, when run in the memory of a computer, execute the method steps according to any one of claims 1 to 11.